
Practical AI #74: Testing ML systems with Tania Allard, developer advocate at Microsoft

#artificialintelligence

I can say I've been working across the machine learning pipeline in all the different roles… And as you mentioned, a lot of these roles are very [unintelligible 00:05:29.20] When people talk about data scientist and data engineering roles in machine learning research, or machine learning engineering rather, they try to use these Venn diagrams… And I've found that they are not very descriptive. For example, if you're working on the data science side of the pipeline, you're focusing much more on the statistics, on developing novel algorithms or models that would help your business or your company to get [unintelligible 00:06:03.07] But then you will probably need some software engineering skills as well, to take that into a production format with the rest of your dev environment or your dev team… Whereas when you're working on the data engineering side of things, you're focusing much more on all the processes that are [unintelligible 00:06:23.24] And then the machine learning engineer role is basically the one that binds it all together.


Machine Learning : Few rarely shared trade secrets

@machinelearnbot

If there are n instances in the data, the probability of 'success' (a given instance being drawn on a single pull) is 1/n, and the probability of 'failure' is (n-1)/n. In the specific case of a bootstrap sample, the sample size b equals the number of instances n. Thus the probability of a given instance being selected at least once is 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632 as n grows. Grid search is computationally expensive because it evaluates every possible combination of the specified parameters. Say two parameters A and B have the ranges 0-2 and 0-3 respectively; the combinations in the grid-search parameter space would be (0,0), (0,1), (0,2), (0,3), ..., (2,2), (2,3), for 3 × 4 = 12 evaluations in total. Although grid search can be made to run in parallel, the technique is still not computationally efficient.
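Both claims above are easy to check numerically. The sketch below (assuming Python; the parameter names A and B and their ranges are just the hypothetical values from the text) verifies the 0.632 bootstrap probability and enumerates the grid-search combinations with `itertools.product`:

```python
import itertools
import math

# Probability that a given instance appears at least once in a
# bootstrap sample of size n: 1 - (1 - 1/n)^n -> 1 - 1/e as n grows.
n = 10_000
p_at_least_once = 1 - (1 - 1 / n) ** n
print(round(p_at_least_once, 3))        # ≈ 0.632
print(round(1 - 1 / math.e, 3))         # 0.632

# Grid search evaluates the full Cartesian product of parameter values.
# Hypothetical parameters from the text: A in 0-2, B in 0-3.
param_grid = {"A": range(0, 3), "B": range(0, 4)}
combos = list(itertools.product(*param_grid.values()))
print(len(combos))                      # 3 * 4 = 12
print(combos[0], combos[-1])            # (0, 0) (2, 3)
```

The combinatorial blow-up is the point: each additional parameter multiplies the number of evaluations, which is why grid search parallelizes trivially but remains expensive overall.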
